Analysis of Intel's Haswell Microarchitecture Using the ECM Model and Microbenchmarks

نویسندگان

  • Johannes Hofmann
  • Dietmar Fey
  • Jan Eitzinger
  • Georg Hager
  • Gerhard Wellein
چکیده

This paper presents an in-depth analysis of Intel’s Haswell microarchitecture for streaming loop kernels. Among the new features examined is the dual-ring Uncore design, Cluster-on-Die mode, Uncore Frequency Scaling, core improvements as new and improved execution units, as well as improvements throughout the memory hierarchy. The Execution-Cache-Memory diagnostic performance model is used together with a generic set of microbenchmarks to quantify the efficiency of the microarchitecture. The set of microbenchmarks is chosen such that it can serve as a blueprint for other streaming loop kernels.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Evaluation of Intel's Restricted Transactional Memory for CPAs

With the release of their latest processor microarchitecture, codenamed Haswell, Intel added new Transactional Synchronization Extensions (TSX) to their processors’ instruction set. These extensions include support for Restricted Transactional Memory (RTM), a programming model in which arbitrary sized units of memory can be read and written in an atomic manner. This paper describes the low-leve...

متن کامل

An Analysis of Core- and Chip-Level Architectural Features in Four Generations of Intel Server Processors

This paper presents a survey of architectural features among four generations of Intel server processors (Sandy Bridge, Ivy Bridge, Haswell, and Broadwell) with a focus on performance with floating point workloads. Starting on the core level and going down the memory hierarchy we cover instruction throughput for floating-point instructions, L1 cache, address generation capabilities, core clock ...

متن کامل

Execution-Cache-Memory Performance Model: Introduction and Validation

This report serves two purposes: To introduce and validate the Execution-Cache-Memory (ECM) performance model and to provide a thorough analysis of current Intel processor architectures with a special emphasis on Intel Xeon HaswellEP. The ECM model is a simple analytical performance model which focuses on basic architectural resources. The architectural analysis and model predictions are showca...

متن کامل

A Performance Evaluation of the Nehalem Quad-Core Processor for Scientific Computing

In this work we present an initial performance evaluation of Intel's latest, secondgeneration quad-core processor, Nehalem, and provide a comparison to first-generation AMD and Intel quad-core processors Barcelona and Tigerton. Nehalem is the first Intel processor to implement a NUMA architecture incorporating QuickPath Interconnect for interconnecting processors within a node, and the first to...

متن کامل

Kummer Strikes Back: New DH Speed Records

This paper introduces high-security constant-time variable-base-point Diffie–Hellman software using just 274593 Cortex-A8 cycles, 91460 Sandy Bridge cycles, 90896 Ivy Bridge cycles, or 72220 Haswell cycles. The only higher speed appearing in the literature for any of these platforms is a claim of 60000 Haswell cycles for unpublished software performing arithmetic on a binary elliptic curve. The...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016